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Evaluation de certaines constantes hypergeometriques 
efficace en temps et en espace 

Resume : Les meilleurs algorithmes actuels pour revaluation numerique de constantes 
hypergeometriques comme £(3) avec d chiffres de precision ont une complexity en temps 
0(M(d) log 2 d) et en espace 0{d\ogd) ou 0(d). Dans la lignee de travaux de Cheng, Gergel, 
Kim and Zima, nous presentons un nouvel algorithme de meme complexity asymptotique, 
mais plus efficace en pratique. Notre implementation de cet algorithme est plus efficace 
que les codes existants pour les calculs de n, et nous annongons un nouveau record de 2 
milliards de chiffres decimaux for £(3). 

Mots-cles : Constantes hypergeometriques, scindage binaire, crible 
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1 Introduction 

In this article, we are interested in the high-precision evaluation of constants defined by 
hypergeometric series of the form 

oo n— 1 / .n 

n=0 i=0 yW 

where a, p and q are polynomials with integer coefficients. We shall also assume, without 
loss of generality, that p and q are coprime, have no nonnegative integer as a zero and that 
p(n)/q(n) tends to a constant < c < 1 when n goes to infinity. 

Under those assumptions, the series converges, and we can compute an approximation 
to the constant by truncating the series, i.e., by computing 

AT-l n-1 /.x 

i><">n§ c> 

n=0 i=0 ^ v ' 

for an appropriately chosen N = 0(d) with d being the number of decimal digits desired. 
The high-precision evaluation of elementary functions and other constants — including 
the exponential function, logarithms, trigonometric functions, and constants such as 7r or 
Apery's constant C(3) — is commonly carried out by evaluating such series [101 112] . For 
example, we have 

' l2Vr r 545140134n+ 13591409 (&n)\ 
^ ' 640320 3 -+ 3 / 2 (3n)!n! 3 



7T 

or 



ra=0 



oo 



2C(3) = ^(-l)"(205n 2 + 250n + 77) 1 5 . (4) 

n=0 ' 

Assuming that g(n) has size O(logn), the special form of the series ([2]) implies that the 
common denominator YliLo 2 nas a relatively small size of O(iVlogiV). An approach 
commonly known as "binary splitting" has been independently discovered and used by 
many authors in such computations [H El [3l HJ [9l [101 021 EE]- In binary splitting, the 
use of fast integer multiplication yields a total time complexity of 0(M(d log d) log d) = 
0(M(d) log 2 d), where M(t) = 0(t logt loglogt) is the complexity of multiplication of two 
t-bit integers [15] . The 0(d\ogd) space complexity of the algorithm is the same as the size 
of the computed numerator and denominator. 

The numerator and denominator computed by the binary splitting approach typically 
have large common factors. For example, it was shown that in the computation of 640,000 
digits of C(3)> the size of the reduced numerator and denominator is only 14% of the size 
of the computed numerator and denominator. This suggests possible improvements of 
the method, by avoiding the unneeded computation of the common divisor between the 
numerator and denominator. Several approaches have already been taken in that direction. 
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In particular, [6] suggests to use a partially factored form for the computed quantities, in 
order to efficiently identify and remove common factors, and pT] goes further by explicitly 
constructing the common divisor and dividing out the numerator and denominator. 

The present work builds on top of this strategy and uses a fully factored form in the 
binary splitting process. We show that the fully factored form yields a time complexity of 
0{M{d) log 2 d), and space complexity 0(d). This matches the complexity of the standard 
approaches, but provides a practical speedup confirmed by experiments. Our method 
appears to be noticeably faster than other optimized binary splitting implementations 
aimed at the computation of digits of 7r or other constants. We also show in this article 
that the exact set of series that are amenable to efficient computation using the fully 
factored form is characterized by a simple criterion: only the series where p(n) and q(n) 
are products of linear factors exhibit the large common factor that was observed in the 
computation of C(3)- Therefore our attention is restricted to that case. 

This article is organized as follows. Section [2] recalls the binary splitting algorithm, 
and reviews the different approaches for improving the practical efficiency of the method. 
Section [3] examines in detail the size of the reduced fraction computed by the binary 
splitting algorithm. Section H] presents the alternative of using a fully factored form in 
the binary splitting approach. In Section [5j the analysis of the algorithm is performed. 
Section [6] concludes with experimental data, and a comparison with other programs. 

2 The binary splitting approach and its variants 

We give a brief description of the binary splitting approach here, following the notations 
from [6]. 

Our approximation to the constant to be evaluated can be written S(0, N) where for 
< n\ < n 2 we define 



Letting P(n 1 ,n 2 ) = U7=n\ K n )> Q( n ^ n a) = n^L^i ^( n ) ' and ? ^2) = S(n 1 ,n 2 )Q(n 1 ,n 2 ), 
we have for n\ < m < n 2 , with T(n, n + 1) = a(n)p(n): 



This leads to a recursive algorithm to evaluate T(0, N) and Q(0, N), which corresponds to 
the evaluation of a product tree [lj. One then deduces S(0, N) by a division. 

Since p(n)/q(n) tends to c, the tail S(N, 00) of the series is bounded by 0(c N ). There- 
fore, to compute the constant S(0,oo) with error e, we need N = + 0(1) terms: 
the number of terms is proportional to the number d of digits of accuracy desired. The 




P(ni,n 2 ) 
Q(n h n 2 ) 
T(n ll n 2 ) 



P(ni, m)P(m, n 2 ) 
Q(n 1 ,m)Q(m } n 2 ) 

T(rii, m)Q(m, n 2 ) + P(ni, m)T(m, n 2 ). 
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corresponding product tree has height log A/", where the leaves have O (log AT) bits and 
hence the root has 0(N log N) bits. The total evaluation of the truncated series costs 
0(M(dlogd) log cf) = 0(M(d) log 2 d) with the best-known multiplication algorithms. 

Although some constants such as 7r and log 2 can be computed to d digits with bit com- 
plexity of 0(M(d) logd) using the Arithmetic Geometric Mean (AGM) [2], the 0{M(d) log 2 d) 
binary splitting algorithm is still competitive up to billions of digits. For example, D. V. and 
G. V. Chudnovsky held the ir record using Formula ([3]) with 8 billion digits in 1996 |10j . 

2.1 Improvements of the binary splitting method 

As mentioned earlier, the binary splitting method suffers from the drawback that the 
fraction T/Q has size 0(dlogd), while an accuracy of only d digits is required. In [10] . 
the authors circumvent this problem by limiting the precision of the intermediate results 
to 0(d) digits. This is used by the PiFast program [10] and results in the same time 
complexity as the binary splitting method but a reduced space complexity of 0(d). This 
truncation, however, implies that the exact reduced fraction is not computed, so that it 
is not easy to extend the computation to higher precision using results already computed. 
Further, the truncation only operates on the top levels of the computation tree, since below 
a depth of order O(loglogd), the computed integers have size 0(d) anyway. Below this 
depth, the computations performed by the PiFast program are expected to be exactly the 
same as in the classical algorithm above. 

Since in the course of the computation of digits of C(3), T and Q have been found to 
share a large number of common factors, Cheng and Zima [6] worked towards efficiently 
removing some of these factors from the computation. For this purpose, a partially factored 
representation was introduced in the binary splitting process. Subsequently, Cheng, Gergel, 
Kim, and Zima [5] applied modular computation and rational number reconstruction to 
obtain the reduced fraction. If the reduced numerator and denominator have size 0(d), the 
resulting algorithm has a space complexity of 0(d) and the same time complexity as binary 
splitting. By carefully analyzing the prime divisors of the numerator and denominator of 
01]), it was shown in [5] that the size of the reduced fraction for ((3) is 0(d); it was noted 
that the analysis was in fact related to using the partially factored representation with 
all possible prime factors in the binary splitting process. However, it was not practical 
to use so many primes in the partially factored representation because it was expensive 
to convert from standard representation by factoring. Additionally practicality of the 
algorithm depends on the availability of the implementation of the asymptotically fast 
rational reconstruction algorithm (for example, see [T6]). 

We also mention the gmp-chudnovsky program [T7], which uses the binary splitting 
method to compute digits of ir using Formula ([3]). Two modifications are made to the 
classical method described above. First, integers P(n\,ri2) and Q(n\,ri2) are handled 
together with their complete factorization. This makes it possible to quickly compute the 
gcd of P(ni, m) and Q(m, n 2 ) by merely comparing the factorizations. Afterwards, the gcd 
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is divided from both numbers. The fraction T/Q is therefore reduced. It should be noted 
that gmp-chudnovsky still works with expanded integers P, Q, and T (albeit reduced). 

The second specificity of the gmp-chudnovsky program lies in the way the leaves p(n) 
and q(n) are computed. Since the factorization of these numbers is sought, an optimized 
sieve table is built. Formula ([3]) implies that the integers to be factored are bounded by 
6N, where N is the number of computed terms. A table of |_^J entries is built, with the 
i-th cell containing information on the smallest prime divisor of 2i + 1, its multiplicity, and 
the integer j such that 2j + 1 is the cofactor. Such a table can be computed very efficiently 
using a modified Eratosthenes' sieve. This represents a tiny part of the total computing 
time. Unfortunately, this sieve table is also an impediment to large scale computations, in 
that it has a space complexity of O(dlogd). 

3 Size of the reduced fraction 

Cheng, Gergel, Kim and Zima showed in [5] that for Formula (j4j) giving C(3), when removing 
common factors between T and Q, the reduced fraction T/Q has size 0(d) only. We show 
here that this fact happens for a large class of hypergeometric constants. 

Understanding when the size of the fraction reduces to 0(d) is closely linked to a study 
of the prime divisors of the values p(i) and q(i). Indeed, the fraction being significantly 
smaller than its expected 0(d\ogd) size means that there are large cancellations at many 
primes; it thus means that the primes occurring in n^^p^) an< ^ are m ostly 

the same, and with the same multiplicities. 

We first notice that since p(n)/q(n) tends to c > when n — > oo, this implies that p 
and q have the same degree. For a polynomial p, we use the notation lc (p) and disc (p) to 
denote the leading coefficient and the discriminant of p, respectively. If p is an irreducible 
polynomial and i a prime (or prime power) coprime to A(p) := lc (p) disc (p) , we shall 
denote pi(p) the number of roots of p modulo I. If p = Yli=iPTi anc ^ ^ ^ s coprime to 
A(p) := Yli=iA.(pi), we shall define pi(p) = $^ i=1 e.iPt(pi)i which is still the number of 
roots of p, counted with multiplicities. 

The following lemmata lead to estimates of the ^-valuation and the size of the quantities 
Q(ni,ri2), T(ni,n 2 ) and their common divisors when the summation range [ni,ri2] grows. 

Definition 1. Let J\f Pt t(ni,n 2 ) be the number of integer roots of p(-) mod £ in [ni,n 2 [: 



Lemma 1. Let p be a polynomial, and I a prime (or prime power) coprime to A(p). Then, 



Np,t(ni,n 2 ) := #{x G [n x , n 2 [/p(x) = mod £}. 



Af p ,e(n ll n 2 ) 



Peip) 

i 



(n 2 -m)+0(l) 



where the implied constant in 0(1) depends on p only. 
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Proof. The roots of p modulo £ in the interval [ni,7i2[ are exactly integers congruent to 
one of the pe{p) roots of p in [0, i — 1]. The Lemma follows, the precise error term being 
at most pi{p) < degp. □ 

Definition 2. For an integer m, let ve(m) be the l-valuation of m, i.e., the largest integer 
j such that P divides m. 

Lemma 2. Let £ be a prime not dividing A(p). Then, 

Vt{P{m, n 2 )) = j—^(n 2 -m)+0 [j—jr J ■ 

Proof. We shall assume without loss of generality that p is irreducible, since by our def- 
inition of pe(p), the result in the general case will follow by linearity. The £- valuation of 
P(ni, n 2 ) is exactly 

v i (P(n 1 ,n 2 )) = } j M p> ei{n 1 ,n 2 ). 

Since £ does not divide A(p), Hensel's Lemma shows that pi(p) = Ptk{p) for all k > 1. 

Further, there exists a constant 7 such that the inequality \p{x)\ < holds over 
the interval [711,712 [■ We then have A/^p(ni, n 2 ) = for j > J p := 7 °^" 2 , and also 
£-J P _ O^- 1 ). Lemma [1] yields for < rii < n 2 : 

^(P(ni,n 2 )) =P/(p)(n2-ni) (j^jj + (j^f) ' 
The statement follows. 

□ 

We need to control (though in a rather rough way) what happens for primes dividing 
A(p). The following weaker lemma is sufficient; its proof is very close in spirit to that of 
Lemma 2. 

Lemma 3. For any prime £, we have 

v e {P{n 1> n 2 )) = 0{n 2 - n x ), 
where the O- constant depends on p only. 

Proof. Let 77, . . . , be the roots of p in Q e , repeated according to their multiplicities. We 
have 

k 

ve(p(x)) = v e (lc (p)) + } j Vj(x - Tj). 

3=1 
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Hence, 

ri2 — 1 k 

Vt{P(nx, n 2 )) = (n 2 - ni)^(lc (p)) + ^ 2J M x ~ r j) 

x=ni j=l 

k n%— 1 

= ^ ^ ~ r j) + (™2 ~ 

j=l x=ni 

Now, as in the proof of Lemmata 1-2, for each j, the number of x G [ni,n 2 [ such that 
x — rj = mod is 0((n 2 — ni)/t). Hence, 

n 2 -l 

£=711 

from which our claim follows. □ 

We can now state our main theorem regarding the size of the fraction T/Q in reduced 
form: 

Theorem 1. For i a prime not dividing A(pq), one has 

u/(gcd(T(ni, n 2 ), Q{ni, n 2 ))) > — (n 2 - m) + O \ T^J J • 

Proof. For < n x < k < n 2 , put r(ni, A;, n 2 ) = a(k)P(ni,k)Q(k,n 2 ). Then we have 

ri2 — 1 

T(n 1 ,n 2 ) = X T(m,k,n 2 ). 

n=ni 

Applying Lemma [21 we see that 

^rK fc, n 2 )) = v e (a(k)) + f^(* - n,) + M^(n 2 - *) + O (^) . 
As such, 

^(T(ni,n 2 )) > mm v t {r(ni, k, n 2 )) > -. — [n 2 -ni)+0[- . 

n 1 <k<n 2 l — l \ log I J 

Joined with Lemma [2] for Vf(Q(ni,n 2 )), this gives the desired statement. □ 
Note also that it is clear from the proof that this lower bound is generically sharp. 
Corollary 1. The following holds: 




Time- and Space- Efficient Evaluation of Some Hypergeometric Constants 



9 



• Ifp(n) and q(n) have only linear irreducible factors, the fraction T(n x , n 2 )/Q(ni, n 2 ) 
in reduced form has size 0(min(n 2 , (n? — n\) log n 2 )). 

• Otherwise, heuristically, as soon as n x = o{n 2 ), it is of size 6(n 2 logn 2 ). 

Proof. In the first case, the second part of the O-estimate is the trivial estimate for the 
size of Q(ni, n 2 ). Since S(ni, n 2 ) = q["*'^) is a partial sum of a converging series, we have 
T(n%, n 2 ) = 0{Q{n%, n 2 )). Therefore this second part holds also for the size of the reduced 
fraction. 

The fact that T(n x ,n 2 ) = 0(Q(ni, n 2 )) also implies that the size of the reduced fraction 
i s log; Qjmm) , Q(i) 

Now, we have 

log — ttt^t-^^vt^ tt = Y] [^(Q(ni,n 2 )) -^(gcd(T(m,n 2 ),Q(ni,n 2 )))]log£. 

gcd(T(7ii,n 2 ),Q(7ii,na)) e ^ me 

However, since q = Yli=i iT ■, we can discard in this sum primes larger than C(q)n 2 for 
some constant C(q) such that < C(q)n 2 for all i,x G [ni,n 2 — 1] (recall that q has 

only linear factors). Further, the finitely many primes dividing the discriminant A of a 
prime factor of pq contribute for 0(n 2 —n{) by Lemma[3j The logarithmic height therefore 
rewrites as: 



( -min(p^(p),p^(g))]logf + 0(logn 2 ) j +0(n 2 -n x ). 

£<C(q)n 2 J prime ^ ' 

Under our assumptions, we have pi(p) = pe(q) = degp = degq, hence this is also 

0(logn 2 ) + 0{n 2 -m) = 0{n 2 ). 

^<C(g)n2, i prime 
(£,A(pg))=l 

We now turn to the case where q has irreducible factors of degree greater than 1. In 
this situation, it is preferable to compute the size of the reduced fraction by subtracting 
the log of the gcd to the asymptotic value of \ogQ{rii,n 2 ) = (n 2 logn 2 — n x logni) degg + 
0{n 2 — rii) = (n 2 — ni) degglogn 2 + 0{n 2 — n x ), since 

n 2 log n 2 - ni log n x = (n 2 - n x ) \ogn 2 + n x \ogn 2 /n x 

(n 2 - m) logn 2 + 0(n 2 - m). 

Again, write 

gcd(r(ni,n 2 ),Q(ni,n 2 )) = 2J ^(g cd (T(m, n 2 ), Q{n u n 2 ))) logl 

£ prime 
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Heuristically, almost only primes of the order of magnitude of at most n 2 — ri\ should 
appear both in T and in Q. Joined with the heuristic remark following Theorem [TJ this 
means that we expect the size of the gcd to be of the order of 

^ mm(p e (p),p e (q)) , 
(n 2 -m) 2^ JZT\ \ogl + 0(n 2 ). 

£<ri2—ni, £ prime 

Recall degp = degg, and let IK be the splitting field of pq. Denote by V the set of 
primes I such that pe{p) = Pi(q) = deg(p)(= deg(g)). 
The size of the gcd is 

deg q — 1 



< (nz-ni) £_i lo S 



£<n,2— m,t prime 
£0> 



+(na-ni) £ ^flog^ + 0(n 2 ) 



£<n2—ni,l prime 

£ev 

= (n 2 -n 1 )log(n 2 -n 1 )(degq-l) + (n2-ni) £ ^^ + 0(n 2 ). 

£<ri2—n\,£ prime 

tev 

The last sum over primes evaluates to 

log £ v-^ 1°S ' 



E rzi = E 



^<7i2— ni,£ prime ^<ri2 — "li ^ prime 

tep £eV 



which, by classical analytic number theory arguments — notice that the primes of V are 
exactly those for which the Artin symbol (£, K/Q) equals 1 — , is 



log ^ 2 - ni) +o(i). 



Thus, the size of the gcd is at most 

{n 2 - m) \og{n 2 - m) ^deg q - 1 + 1 ^ + 0{n 2 ). 

Hence, under this heuristic, we see that the size of the reduced fraction can be 0(n 2 ) 
when rii = o(n 2 ) only if [K : Q] = 1, which means that p and q are products of linear 
factors. □ 

Remark. In the proof of Corollary 1, we show that if p and q have linear factors only, the 
size of the gcd is3 (n 2 — rii) log(n 2 — rii) deg q + 0(n 2 ). As long as (n 2 — rii) log(n 2 — rii) = 



taking into account only primes < n 2 — n±, but any compensation occurring for significantly larger 
primes is bound to be coincidental. 
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Algorithm FastEval(ni, 712, B). 

if (ni == tt-2 — 1) { /* leaf computation */ 

P = FullFactor (p(m), B) ; 

Q = FullFactor (.q(ni) , B) ; 

T = FullFactor (p(ni), B) • a(m); 
} else { 

m = L^l; 

(Pl,Ql,Ti) <— FastEval (ni , m, B) ; 
(P2,Q2,T 2 ) <— FastEval (m, re 2 , P) ; 
P = PartialMult(Pi, P 2 ) ; 
Q = PartialMultCQi, Q 2 ) ; 

T = PartialAdd(PartialMult«52, Pi), PartialMult (Pi , T 2 )); 
} 

Figure 1: Binary splitting using factored representation 

0(712), the size of the gcd is negligible with respect to the size of Q, which means that 
there is almost no compensation between numerator and denominator of the fraction. 
Thus, compensations start to appear in the evaluation tree only as soon as n 2 — n\ is of 
the order of n 2 / logn 2 . 

4 Using a Fully Factored Representation 

We extend in this section the "partially factored representation" of (6J Section 4] to a "fully 
factored representation" for P, Q, and T. 

4.1 Factored representation of integers 

We consider a set B of primes. (In practice, it will consist of all primes needed to completely 
factor p{n) and q(n) up to n = N — 1.) A factored representation over B of an integer z 
is an expression of the form: 

z = Y\p ap ■ r , 

peB 

where a p G N and r G Z. The integer z is represented by the data (({p,a p )) p ,r). For 
efficiency purposes, the representation skips primes p such that a p is zero. Note that 
we do not impose that primes in B do not divide the cof actor r, so different factored 
representations may correspond to the same integer, like 2 2 -3-7 and 2-3-14 over B = {2, 3}. 
When the cofactor r equals 1, we have the (unique) fully factored representation of z. 

A binary splitting method using factored representations of integers can be written as 
in algorithm FastEval (see Fig. [TJo We need to define the three operations FullFactor, 



2 In algorithm FastEval, we do not factor the a{n\) term from T, which will not in general share 
(additional) common primes with the denominator Q. 
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PartialMult and PartialAdd. FullFactor computes the full factorization of a given integer 
over the factor base B, and is obtained by sieving (see below). Further, we define: 

PartialMult(JJp Qp • r, TTjA • s) = JJp Q f +/3 f ■ (rs). 

peB peB peB 

PartialAdd( JJ p Qp 

peB 

where 7 p = min(o;p, (5 p ). 
4.2 Leaf computations 

We have P(n,n + 1) = p(n), Q(n,n + 1) = q(n), T(n,n + 1) = a(n)p(n). The algorithm 
in Figure [D requires computing the fully factored representation of these quantities. This 
corresponds to the leaves of the evaluation tree. 

In order to expect an improvement from the use of the fully factored representation, 
we must make sure that the gain is not offset by the complexity of the leaf computations. 
In order to perform this step efficiently, we use a standard window sieving method. 

As mentioned in the introduction, we assume here that p(n) and q(n) are polynomials 
with linear factors only — like in the case of Formulae (J3j) or (J3J). Without loss of generality, 
we illustrate our sieving procedure with the computation of the fully factored representation 
of the quantity q(n) from Formula (j3J). This is equivalent to the factorization of 2n + 1. 
The sieving produces simultaneously the factorization of all the consecutive odd integers 
in a range [2ni + 1, 2(ni + W) + 1 [, where W is an arbitrary integer. We proceed as follows. 

1. For each odd prime (or prime power) £ such that 3 < £ < 2N, we compute the 
smallest value %n such that 

2(ni +i t ) + 1 = mod t. 

2. Sieve using the procedure in Figure [2j 

Besides this description, the important observation is that the next set of initialization 
values %n for the computation of q{ri\ + W),...,q(ni + 2W — 1) does not have to be 
computed: the code in Figure [2] has already updated these values correctly. 

The translation of this scheme to other factorizations than that of 2n + 1, as long as 
we stick to linear polynomials, is straightforward. 

5 Analysis of the algorithm 

5.1 Cost of sieving 

Because p(n), q(n) are assumed to have linear factors only, their prime divisors are bounded 
by cn for some constant c, thus all p(n),q(n) for n < N can be completely factored over 



nv 

p£B 




r +Y\_P 

peB 
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Algorithm Sieve(ni, n% + W — 1) 
f actored_representation t[1^] 
for all primes I < 2N 

i = in 

while {i < W) { include f in r[i] ; i = i + £ } 

it = i — W 

for all powers l k of I, with i k < 2N 

while (i < W) { increase by one the multiplicity of I in r[i] ; i = i + £ k 

} 

i t k = i -W 

Figure 2: Pseudo-code for sieving 

a set of 0(N/ log AQ primes or prime powers. Since the initialization of the sieve only has 
to be done once, the computation of the ie values is trivial. In total, the sieving code in 
Figure [2] performs 0(N/£) sieve updates for each prime (or prime power) i. The number 
of times the sieve procedure is called is 0(N/W), and each time all of the 0(N/ log N) 
primes or prime powers are scanned. This yields a time complexity for sieving which is 
0(iV log log N + N 2 /(W log N)). The space complexity for sieving is at most 0(W log N). 
There is some freedom in the choice of W, but it must clearly be between 0(N/ (log N) 4 ) 
and 0(N/logN), so that the time and space complexities remain below 0(dlog 3 d) and 
0(d), respectively. 

5.2 The recursion 

The factor base B consists of all possible prime divisors of p(n) or q(n), which means 
0(N/ log N) primes. The integers P and Q are therefore always fully factored. Computing 
the product P(ni,n 2 ) = P(ni,m)P(m,n2) thus just consists in adding the prime expo- 
nents in the lists of factors. If the factored representations of P(ni,m) and P(m, n 2 ) have 
respectively l\ and / 2 elements, this can be done in 0(li + / 2 ) operations. 

At level k — the leaves corresponding to level — the values of P or Q are bounded 
by 0((N dcgp ) 2fc ), thus have O(2 fc logA0 bits. On the other hand, let I be the number of 
different prime factors in the representation of P (counted with multiplicities), then P > 2 l , 
thus P has Q(l) bits. It follows that at level k, we have I = 0(2 k logN). (We also have 
I = O(j^y) since there are that number of primes in the factor base B.) 

The total cost of computing P and Q is thus bounded by Y^k=o w(^ k l°§-^0 = 0(<ilog 2 d). 

As concerns T, if we could prove that its non-factored part is always logA^ times 
smaller than the factored part, we would get a complexity of 0(M(d) logrf) for T. Indeed, 
at level k, the non-factored part of T would have 0(2 k ) bits, thus the cost of computing 
it would be 0(M(2 k )), since it is obtained from a sum of products T(n 1 ,m)Q(m,n 2 ) + 
P(ni,m)T(m,n2), where no cancellation occurs. Thus the total cost for the non-factored 
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part would be Ylklo = 0(M(d) log d). (The analysis for the factored part is 

similar to that for P and Q.) 

Unfortunately, the above property — the non-factored part of T is log N times smaller 
than its factored part — is only true near the root of the product tree, where common 
factors cancel between P and Q. Thus the computation of T costs 0(M(d) log d) (unless 
we can do better). 

6 Experimental Results 

In this section we investigate the benefit of using the fully factored representation for the 
purpose of computing the sum of series such as (jSJ) or (j4j). 

Compared to the sieve table of the gmp-chudnovsky program mentioned in Section [2J 
we address the problem of the leaf computation in a different way. The sieving procedure 
described in Section H] keeps a space complexity of the order of 0(d). We found that our 
sieving procedure was competitive with the sieve table from the gmp-chudnovsky program. 

The fully factored representation is an asset as soon as compensations between prime 
factors start to appear. However, this is not encountered at the very lowest levels of the 
computation tree, near the leaves. Quite naturally, a cut-off level appears between the use 
of the factored representation and the use of the fully expanded integer values. At the 
lowest levels of the tree, we use the same approach as the gmp-chudnovsky program. For 
P and Q, both the expanded integer and its factorization are kept. The integer component 
is dropped above a certain height in the tree. The remark concluding Section [5] suggests 
that the switch from one algorithm to the other be done when n<i — n% is of the order of 
Io " rc 2 ■ However the running time is the measurement here, and the precise cut-off is chosen 
by trial and error. 

Another implementation note concerns the PartialAdd operation mentioned in Sec- 
tion HI and also the final expansion of T and Q from the factored form to a flat integer. For 
this purpose, we use the same kind of algorithm as mentioned in [H] for the computation 
of n\. 

We implemented our algorithm in C++ with the GMP and MPFR libraries [TTIIT]. We 
modified the GMP library with an improved FFT multiplication code [8]. We compare our 
results with the two programs mentioned in Section El which compute digits of tt using 
Formula ([3]). For the purpose of comparison, we focus on the time for the evaluation of 
the fraction T/Q. Because our program and the gmp-chudnovsky program share the GMP 
library as a common backbone, we are convinced that this comparison gives the most 
meaningful results. Table [1] gives the relative time spent in the binary splitting process for 
our program and for the gmp-chudnovsky program, measured on a 2.4Ghz Opteron CPU. 
The gain seems to grow slightly with the number of digits computed. 

Table [1] also mentions timings of our program against the PiFast program. The com- 
parison has been made on a 3 GHz Pentium 4 CPU (hence the different timings). The 
lack of source code access for PiFast mandates some caution for the interpretation of the 
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Opteron, 2.4Ghz 



digits 


our code 


gmp 


-chudnovsky 


ratio 


2 25 


60s 




61s 


0.98 


2 26 


136s 




147s 


0.93 


2 27 


322s 




352s 


0.91 


2 28 


768s 




853s 


0.90 


2 29 


1868s 




2059s 


0.91 


2 30 


4654s 




5328s 


0.87 




Pentium 4, 


3Ghz 




digits 


our code 


PiFast 


ratio 


28 x 10 6 


127s 




135s 


0.94 


40 x 10 6 


192s 




206s 


0.93 


57 x 10 6 


291s 




323s 


0.90 



Table 1: Comparison with the gmp-chudnovsky and PiFast programs for computing digits 
of 71". 

timings, since the operating system overhead seems to be included. Nonetheless, it seems 
that our fully factored binary splitting provides a growing improvement. 

Finally, we used our program to establish a new record-size computation for 2 billion 
decimal digits of C(3)- The series of Formula (jlj) was evaluated up to N = 664385619 
terms. The computation of Q and T was first spanned over 16 distinct 2.4Ghz Opteron 
processors. The cumulative CPU time spent by these processors to compute their fraction 
of the result was 20 hours. These results were gathered to form the final fraction T/Q 
(in factored form) on a single computer in 3 hours. Converting the fraction T and Q to 
integers took 18 minutes. The division took 53 minutes, and the decimal conversion took 
2 hours and 37 minutes (these timings are suboptimal, since GMP does not yet implement 
Newton's division). 

ERROR ANALYSIS. Using the inequality ^j-j- < \ an d a rough bound on a{n), it can 
be shown that Formula (jlj) gives an error of at most 2 _10Ar+21og2 7V+2 , which is less than 
2 -6643856i29 Using a precision of p = 6643856189 bits, we converted T and Q to 

floating-point numbers, divided both, and converted the binary quotient to a decimal 
string of 2 ■ 10 9 + 1 digits, all those operations being made in rounding to nearest mode 
with the MPFR library. An error analysis yields a maximal absolute error of 2 1_p , which 
together with the above truncation error gives a maximal absolute error of io~ 1999999981 . 

Acknowledgement. The authors thank Richard Brent for his clever remarks on a first 
draft of this paper. 
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